Heart failure (HF) is a global health threat that calls for urgent research into its
classification. This study proposes a novel approach for HF classification that
integrates advanced supervised learning (ASL) with particle swarm
optimization (PSO). ASL techniques such as bagging and AdaBoost are
employed within the PSO+ASL optimization model to enhance prediction
accuracy. PSO optimizes model weights and biases, while ASL addresses
overfitting and underfitting. Split validation and cross-validation
(70:30, 80:20, and 90:10 with k-fold=10) are used for further optimization. The
testing phase involves 12 classifiers in five groups: decision tree models
(DTM), support vector machines (SVM), Naive Bayes classifier models
(NBCM), logistic regression models (LRM), and lazy models (LM).
The proposed approach is evaluated on an HF patient dataset from
https://www.kaggle.com, and the results of the standard model, PSO
optimization, and PSO+ASL are compared. Experimental findings demonstrate the
superiority of the proposed approach, which achieves higher accuracy in HF
prediction. The PSO+ASL optimization model with the k-nearest neighbor
(k-NN) method exhibits the best classification performance: it consistently
achieves the highest accuracy for every dataset composition ratio,
with accuracy, f-measure, sensitivity, and specificity of 100% and an area
under the curve (AUC) of 1. The proposed approach can serve as a reliable tool for
early detection and prevention of HF.
1. INTRODUCTION

Heart failure (HF) is a medical condition that is characterized by a complex set of symptoms rather 
than a specific disease [1]. It occurs when the ventricle struggles to fill or empty with blood, making it 
challenging for the heart to meet the body’s circulation needs. Common symptoms include shortness of 
breath, swollen ankles, and fatigue, while signs such as high jugular venous pressure, pulmonary crackles, 
and peripheral edema may also be present, indicating structural and/or functional cardiac or non-cardiac 
abnormalities [2], [3]. In Indonesia, heart disease is the leading cause of death, and HF represents a 
significant portion of these cases [4]. Approximately 5% of the country’s population is estimated to suffer 
from HF [5]. Furthermore, the fatality rate is significant, with up to 17.2% of all HF patients dying during
their initial hospitalization, regardless of a history of heart attacks. Additionally, 11.3% of patients die
within a year of starting treatment, while another 17% require repeated hospitalizations due to worsening
HF. These patients are typically hospitalized at least once a year after diagnosis, and their average age is 58.
Data from the Basic Health Research survey (Riskesdas) for 2013 and 2018 show an increasing trend in heart
disease, rising from 0.5% in 2013 to 1.5% in 2018. Heart disease, including HF, is associated with significant 
healthcare costs, with IDR 7.7 trillion spent on it in 2021, according to data from the Social Security 
Administering Body for Health (BPJS). These statistics emphasize the importance of early detection and 
treatment of HF. Traditional diagnosis of HF relies on the patient's medical history, physical examination, and the
doctor's assessment of related symptoms [3], [6]. Angiography is one of the most reliable
conventional methods for diagnosing HF [7]. However, it requires specialized expertise, is costly,
and carries potential side effects [8].

While there have been efforts to achieve high predictive performance and identify relevant risk 
factors associated with HF, the emergence of artificial intelligence (AI) tools and machine learning (ML) 
algorithms [9], [10] in recent years has provided powerful diagnostic aids [11]. These tools can extract 
knowledge from large amounts of data, which may be difficult or impossible for humans to achieve [12], [13]. 
By employing ML-based decision-making approaches, doctors can detect the risk of HF and provide 
necessary treatments and recommendations to manage these risks [14]. Early detection and treatment using 
ML techniques have the potential to significantly improve patient survival rates. Consequently, several studies
have applied ML to the diagnosis [15]-[19] and prediction of HF, for example, estimating the likelihood that a
patient has a disease history that may cause HF, such as hypertension, diabetes, or hyperlipidemia [20]-[23].
Various classification algorithms, including decision trees [24]-[26], support vector machines (SVM) [27],
Naive Bayes [28], and neural networks [29], have been used for HF prediction. Despite these efforts,
accurately predicting HF remains a significant challenge. Comparison and benchmarking results of ML 
classifiers have shown no significant differences in performance [30], and no single classifier has proven to 
be the best for all datasets. 

Our study aims to address this gap in accurately predicting heart failure using machine
learning techniques. In this research, we present a novel approach that incorporates advanced supervised learning (ASL) [29]-[31]
and particle swarm optimization (PSO) [32], [33] techniques to optimize classification results. Moreover, we
employ split validation and cross-validation with composition ratios of 70:30, 80:20, and 90:10,
using k-fold=10, and test twelve classifiers sorted into five groups: decision tree models (DTM), SVM,
Naive Bayes classifier models (NBCM), logistic regression models (LRM), and lazy models (LM). The
selection of these classifiers was based on two considerations. First, previous studies have applied
various classification algorithms, such as decision trees [22]-[24], SVM [25], Naive Bayes [26], and neural
networks [27], to HF prediction; these algorithms have demonstrated their effectiveness in
handling complex datasets and have been widely employed in HF research. Second, no single classifier has
proven to be the best for all datasets or consistently outperforms the others, as comparison and benchmarking
results of ML classifiers have shown no significant differences in performance [28]. Therefore, by evaluating
the PSO and ASL algorithms on a diverse set of 12 classifiers grouped into five categories, this study aims to
assess the strengths and weaknesses of each classifier and determine the most appropriate one for HF
classification. This research contributes a more precise approach to diagnosing heart failure, supporting
early detection and improved patient outcomes. Furthermore, our findings can guide future research aimed
at enhancing the diagnosis and treatment of heart failure. The integration of AI and ML techniques [31], [32]
in healthcare holds great promise for enhancing patient well-being and reducing healthcare expenses.


2. METHOD 

The primary objective of this study is to enhance the classification performance of 12 classifiers 
through the integration of ASL and PSO techniques. A comprehensive evaluation of classifier performance 
was conducted using a combination of split tests and cross-validation. The data were partitioned into
training and test sets at ratios of 70:30, 80:20, and 90:10, with a k-fold value of 10. The effectiveness
of the proposed model was assessed on data from HF patients, and the classification performance of each
classifier on this dataset was compared between the standard and optimized models.
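The hold-out partitioning at the three ratios can be sketched as follows. This is a minimal illustration, assuming a shuffled, non-stratified split with a fixed random seed; the paper does not state whether its splits were stratified or which seed was used, and the helper name `split_indices` is hypothetical.

```python
import random
from typing import List, Tuple


def split_indices(n: int, train_ratio: float, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle record indices 0..n-1 and cut them into training and testing parts.

    Hypothetical helper: train_ratio = 0.7, 0.8, or 0.9 gives the 70:30, 80:20,
    and 90:10 compositions; the fixed seed is an assumption for reproducibility.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_ratio)
    return indices[:cut], indices[cut:]


if __name__ == "__main__":
    n_records = 918  # size of the combined HF dataset
    for label, ratio in (("70:30", 0.7), ("80:20", 0.8), ("90:10", 0.9)):
        train, test = split_indices(n_records, ratio)
        print(label, "train =", len(train), "test =", len(test))
```

A stratified variant (shuffling HF and normal records separately before cutting) would keep the class balance identical in both parts; whether the study did this is not reported.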


2.1. Data preparation and processing 

For this study, a dataset combining five distinct datasets from various sources was obtained from
Kaggle (https://www.kaggle.com). These datasets are Cleveland (303 observations), Hungary (294
observations), Switzerland (123 observations), Long Beach VA (200 observations), and Statlog (Heart)
(270 observations). Although these sources total 1,190 records, the combined dataset contains 918
observations after duplicate records are removed. It encompasses twelve variables, with eleven variables
serving as inputs and one variable acting as the output (label). Each variable's subset was tailored
according to the specific requirements of the study. The following paragraph describes the variables
used in the HF study.

The study used the dataset sampled in Table 1 (complete data can be accessed at https://shorturl.at/klvS2),
which consists of the following patient parameters: the age of the patient in years; the sex of the patient
(M for male and F for female); the type of chest pain experienced (TA for typical angina, ATA for atypical
angina, NAP for non-anginal pain, and ASY for asymptomatic); the resting blood pressure (RestingBP) in mm Hg;
the serum cholesterol level in mg/dl; the fasting blood sugar (FastingBS) (1 if FastingBS > 120 mg/dl, 0
otherwise); the results of the resting electrocardiogram (RestingECG) (Normal, ST for ST-T wave abnormality,
and LVH for probable or definite left ventricular hypertrophy); the maximum heart rate achieved (MaxHR)
(numeric value between 60 and 202); the presence of exercise-induced angina (Y for yes and N for no);
oldpeak, the numeric value of ST depression; the slope of the peak exercise ST segment (Up for upsloping,
Flat for flat, and Down for downsloping); and the output class indicating the presence of heart disease
(1 for HF and 0 for normal).

Table 1 presents sample data for predicting heart failure, covering the patients' age, sex, chest pain type,
resting blood pressure (RestingBP), cholesterol level, fasting blood sugar (FastingBS), resting
electrocardiogram results (RestingECG), maximum heart rate achieved (MaxHR), exercise-induced angina,
ST depression during exercise (Oldpeak), ST slope, and heart disease condition. The data consist of 918
patients, with each row representing one patient. These data can be used to build a predictive model that
helps identify the risk factors associated with heart failure. By analyzing them, researchers can gain
insights into patterns or correlations between the variables that may contribute to the onset of heart
failure, which can inform clinical decisions and improve patient outcomes.
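Before training, the categorical attributes of Table 1 have to be turned into numbers and the numeric attributes normalized (Figure 1 lists a normalization step). The sketch below is one plausible preprocessing, assuming one-hot encoding for categorical attributes and min-max scaling to [0, 1] for numeric ones; the paper specifies neither choice, and the column identifiers (`ChestPainType`, `ExerciseAngina`, `ST_Slope`, and so on) are assumed names for the Table 1 columns.

```python
from typing import Dict, List, Sequence

# Assumed identifiers for the Table 1 attributes (illustrative, not from the paper)
NUMERIC = ["Age", "RestingBP", "Cholesterol", "FastingBS", "MaxHR", "Oldpeak"]
CATEGORICAL: Dict[str, List[str]] = {
    "Sex": ["M", "F"],
    "ChestPainType": ["TA", "ATA", "NAP", "ASY"],
    "RestingECG": ["Normal", "ST", "LVH"],
    "ExerciseAngina": ["Y", "N"],
    "ST_Slope": ["Up", "Flat", "Down"],
}


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def encode(rows: Sequence[Dict[str, object]]) -> List[List[float]]:
    """Turn Table 1 style records into feature vectors: 6 scaled numeric + 14 one-hot values."""
    scaled = {c: min_max([float(r[c]) for r in rows]) for c in NUMERIC}
    vectors = []
    for i, r in enumerate(rows):
        vec = [scaled[c][i] for c in NUMERIC]
        for col, levels in CATEGORICAL.items():
            vec.extend(1.0 if r[col] == level else 0.0 for level in levels)  # one-hot block
        vectors.append(vec)
    return vectors
```

In practice the minimum and maximum should be taken from the training portion only and reused on the test portion, so that no information from the test data leaks into preprocessing.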


2.2. Proposed model architecture 

The proposed model architecture aims to enhance the accuracy of predicting HF by leveraging 
advanced ML techniques, such as PSO-based algorithms and supervised learning algorithms. Through the 
selection of pertinent features and optimization of model parameters, this approach enables more precise 
predictions, which can be instrumental for healthcare professionals in making informed decisions regarding 
patient care. To ensure the robustness of the proposed approach, a combination of split validation and 
cross-validation was implemented, utilizing different composition ratios of 70:30, 80:20, and 90:10, with a 
k-fold value of 10. To evaluate the effectiveness of the model, twelve classifiers were employed and grouped 
into five categories, namely DTM, SVM, NBCM, LRM, and LM. 

The confusion matrix and area under the receiver operating characteristic curve (AUC) are utilized 
for model evaluation in the classification task due to their ability to comprehensively assess the performance 
of the classification model [33], [34]. The confusion matrix allows for a detailed analysis of the model’s 
predictions compared to the actual labels, enabling an evaluation of its accuracy in classifying instances into 
different classes. Additionally, the selection of AUC serves to measure the overall performance of the 
classifier [35]. AUC represents the classifier’s capacity to distinguish between positive and negative 
examples at various classification thresholds. It provides a concise summary of the classifier’s performance 
in a single value, making it particularly valuable when adjusting the classification threshold based on specific 
applications or domains [36]. 
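AUC can be computed without drawing the ROC curve, because it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half. The sketch below uses this pairwise formulation; the function name `auc` and the input format (one score per example, label 1 for HF and 0 for normal) are illustrative assumptions.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Counts, over all positive/negative pairs, how often the positive example
    scores higher (ties count 0.5). O(P*N), fine for a few hundred test records.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A classifier that gives every example the same score obtains an AUC of exactly 0.5, the chance level, while perfect separation of the two classes gives 1.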


Table 1. Dataset 


No   Age  Sex  Chest pain type  RestingBP  Cholesterol  FastingBS  RestingECG  MaxHR  Exercise angina  Oldpeak  ST_slope  Heart disease
1    40   M    ATA              140        289          0          Normal      172    N                0        Up        Normal
2    49   F    NAP              160        180          0          Normal      156    N                1        Flat      HF
3    37   M    ATA              130        283          0          ST          98     N                0        Up        Normal
4    48   F    ASY              138        214          0          Normal      108    Y                1.5      Flat      HF
5    54   M    NAP              150        195          0          Normal      122    N                0        Up        Normal
6    39   M    NAP              120        339          0          Normal      170    N                0        Up        Normal
7    45   F    ATA              130        237          0          Normal      170    N                0        Up        Normal
8    54   M    ATA              110        208          0          Normal      142    N                0        Up        Normal
...
914  45   M    TA               110        264          0          Normal      132    N                1.2      Flat      HF
915  68   M    ASY              144        193          1          Normal      141    N                3.4      Flat      HF
916  57   M    ASY              130        131          0          Normal      115    Y                1.2      Flat      HF
917  57   F    ATA              130        236          0          LVH         174    N                0        Flat      HF
918  38   M    NAP              138        175          0          Normal      173    N                0        Up        Normal

2.3. Model training and evaluation

The implementation of ASL and PSO techniques for data mining classification in predicting HF 
involves several key steps. These steps, including data preparation, model selection, hyperparameter 
optimization, training, evaluation, and reporting, are crucial for constructing a precise and reliable 
classification model for HF prediction. During the training phase, each model uses 10-fold
cross-validation (k-fold=10) to support the robustness and generalizability of the models. Figure 1
illustrates these steps and their place in the overall process.
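The 10-fold cross-validation used during training can be sketched as follows, assuming the (already shuffled) training indices are assigned to folds round-robin. The names `kfold_splits` and `cross_val_accuracy` and the `fit`/`predict` callbacks are hypothetical stand-ins for any of the twelve classifiers; the paper does not describe its fold assignment.

```python
from typing import Callable, List, Sequence, Tuple


def kfold_splits(indices: Sequence[int], k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) pairs; every index is validated exactly once."""
    folds = [list(indices[i::k]) for i in range(k)]  # round-robin, fold sizes differ by at most 1
    splits = []
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, val))
    return splits


def cross_val_accuracy(indices: Sequence[int], labels: Sequence[int],
                       fit: Callable[[List[int]], object],
                       predict: Callable[[object, int], int], k: int = 10) -> float:
    """Mean validation accuracy over k folds.

    `fit(train_indices)` returns a trained model and `predict(model, index)` returns a
    class label; both are hypothetical hooks for whichever classifier is evaluated.
    """
    scores = []
    for train, val in kfold_splits(indices, k):
        model = fit(train)
        correct = sum(predict(model, j) == labels[j] for j in val)
        scores.append(correct / len(val))
    return sum(scores) / len(scores)
```

For a 70:30 split of the 918 records, the 643 training indices yield three validation folds of 65 records and seven of 64.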

Furthermore, Table 2 presents the optimization technique used in the study, which combines PSO and ASL.
The pseudocode in Table 2 outlines the step-by-step process of this combined optimization approach and
serves as a reference for understanding the methodology employed in the study and the integration of
PSO and ASL in the optimization process.

The following explains the pseudocode for PSO and ASL. The ASL algorithm takes as input the training
data T, the number of base classifiers B, the subspace size S, the learning rate alpha, and the number of
iterations, and it produces an ensemble classifier model. The algorithm starts by initializing the base
classifiers and their corresponding weights. For each base classifier, a random subspace of S features is
selected, and the base classifier is trained on a subset of the training data restricted to the selected
features. The weight of each base classifier is calculated from its classification error on a validation set.
Next, the base classifiers are combined using weighted majority voting. For each instance in the training
data, the ensemble output vector Y is initialized to zero; each base classifier classifies the instance, and
its weighted output is added to Y. The ensemble output vector is then normalized to obtain a probability
distribution, and the weights of the base classifiers are updated based on their error on this instance.
The algorithm repeats this process for the specified number of iterations and finally returns the ensemble
classifier model. These algorithms are compared with a standard classification model consisting of
11 classifiers.
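The ASL procedure of Table 2 can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the base classifiers are decision stumps (a hypothetical choice), bagging is realized as bootstrap sampling with out-of-bag rows as the validation set, each classifier's weight uses the AdaBoost-style formula 0.5 * ln((1 - err) / err), and the per-instance update is read as a weighted-majority rule that multiplies the weight of every wrong classifier by (1 - alpha). Unlike full AdaBoost, training instances are not reweighted, and the base classifiers are trained once rather than in every iteration. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# A decision stump: (feature index, threshold, class predicted when the value exceeds the threshold)
Stump = Tuple[int, float, int]


def train_stump(X: Sequence[Sequence[float]], y: Sequence[int],
                rows: Sequence[int], features: Sequence[int]) -> Stump:
    """Pick the feature/threshold pair with the fewest errors on `rows`, searching only `features`."""
    best: Stump = (features[0], 0.0, 1)
    best_err = len(rows) + 1
    for f in features:
        values = sorted({X[i][f] for i in rows})
        # candidate thresholds: midpoints between consecutive observed values
        for thr in [(a + b) / 2 for a, b in zip(values, values[1:])] or values:
            for high in (0, 1):
                err = sum((high if X[i][f] > thr else 1 - high) != y[i] for i in rows)
                if err < best_err:
                    best, best_err = (f, thr, high), err
    return best


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    f, thr, high = stump
    return high if x[f] > thr else 1 - high


def train_asl(X: Sequence[Sequence[float]], y: Sequence[int], n_classifiers: int = 5,
              subspace: int = 2, lr: float = 0.5, iterations: int = 3,
              seed: int = 0) -> Tuple[List[Stump], List[float]]:
    """Hypothetical ASL sketch: bagged random-subspace stumps with boosting-style weights."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    stumps: List[Stump] = []
    weights: List[float] = []
    for _ in range(n_classifiers):
        features = rng.sample(range(d), subspace)            # random feature subspace of size S
        boot = [rng.randrange(n) for _ in range(n)]          # bootstrap sample (bagging)
        in_bag = set(boot)
        val = [i for i in range(n) if i not in in_bag] or list(range(n))  # out-of-bag validation rows
        stump = train_stump(X, y, boot, features)
        err = sum(stump_predict(stump, X[i]) != y[i] for i in val) / len(val)
        err = min(max(err, 1e-6), 1 - 1e-6)                  # keep the logarithm finite
        stumps.append(stump)
        weights.append(max(0.0, 0.5 * math.log((1 - err) / err)))  # AdaBoost-style weight
    for _ in range(iterations):                              # weighted-majority refinement
        for i in range(n):
            for b, s in enumerate(stumps):
                if stump_predict(s, X[i]) != y[i]:
                    weights[b] *= 1 - lr                     # shrink the weight of a wrong voter
    return stumps, weights


def predict_proba(model: Tuple[List[Stump], List[float]], x: Sequence[float]) -> List[float]:
    """Weighted majority vote, normalized to a probability distribution over classes 0 and 1."""
    stumps, weights = model
    votes = [0.0, 0.0]                                       # ensemble output vector Y
    for s, w in zip(stumps, weights):
        votes[stump_predict(s, x)] += w
    total = sum(votes)
    return [v / total for v in votes] if total > 0 else [0.5, 0.5]
```

Swapping the stump for any other base learner only requires replacing `train_stump` and `stump_predict`; the bagging, subspace, weighting, and voting logic stays the same.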

In simple terms, Table 2 explains how the PSO and ASL algorithms can improve the performance of
classification in predicting heart failure. The PSO algorithm can be used to select an optimal feature
subset from the heart failure prediction dataset, while ASL combines bagging and boosting techniques to
form a more reliable ensemble classifier from diverse base classifiers. The output of the ensemble
classifier can then be used as input for the PSO algorithm to optimize the parameters of the classification
model. Used together, the two algorithms can improve the quality of the output of each base classifier and
select the most important features to form the feature subspace, resulting in a more accurate and reliable
classification model for predicting heart failure.
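A binary PSO for the feature-selection role described above can be sketched as follows. It follows the structure of Algorithm 1 in Table 2 (personal and global bests, velocity and position updates) but assumes details the paper does not give: the binary variant with a sigmoid transfer function, inertia weight w = 0.7, acceleration coefficients c1 = c2 = 1.5, velocity clamping at 4, and a fixed iteration budget as the termination condition. In the paper's setting, `fitness` would be something like the validation accuracy of a classifier trained on the selected columns; here it is a caller-supplied function, and `binary_pso` is a hypothetical name.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def binary_pso(fitness: Callable[[Sequence[int]], float], n_features: int,
               n_particles: int = 20, iterations: int = 50, w: float = 0.7,
               c1: float = 1.5, c2: float = 1.5, v_max: float = 4.0,
               seed: int = 0) -> Tuple[List[int], float]:
    """Binary PSO maximizing `fitness` over 0/1 feature masks (1 = feature kept)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                          # personal best positions
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]         # global best
    for _ in range(iterations):                          # termination: fixed iteration budget
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward personal best
                     + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward global best
                vel[i][d] = max(-v_max, min(v_max, v))
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))   # sigmoid transfer to a bit probability
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:                        # a new global best is also a new personal best
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    target = [1, 0, 1, 1, 0, 0]  # toy "ideal" mask, purely illustrative

    def toy_fitness(mask: Sequence[int]) -> float:
        return float(sum(m == t for m, t in zip(mask, target)))

    print(binary_pso(toy_fitness, n_features=6))
```

Because each fitness call would retrain a classifier in the real setting, the number of particles and iterations directly controls the cost of the search.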


[Figure 1 flowchart: Start → download HF dataset (https://www.kaggle.com) → preprocessing (normalization, 12 attributes) → split dataset (70:30, 80:20, 90:10) into training and testing data → processed training with validation and learning for Model A: standard learning algorithm (11 classifiers); Model B: optimization learning algorithm (11 classifiers) + feature selection (PSO); Model C: optimization learning algorithm (11 classifiers) + feature selection (PSO) + ASL (bagging and boosting) → performance report for Models A, B, and C (accuracy, F-measure, sensitivity, specificity, AUC) → the best learning algorithm]


Figure 1. Proposed model


Integration of PSO-based advanced supervised learning techniques for classification data ... (Mesran) 


ISSN: 1693-6930


Table 2. Pseudocode combination algorithms 


Algorithm 1. PSO
initialize population of particles
for each particle in population do:
    initialize particle position and velocity
    evaluate particle fitness
    update personal best position and fitness
end for
initialize global best position and fitness
repeat until termination condition is met do:
    for each particle in population do:
        update particle velocity based on current and previous positions
        update particle position
        evaluate particle fitness
        if particle has better fitness than personal best then:
            update personal best position and fitness
        end if
        if particle has better fitness than global best then:
            update global best position and fitness
        end if
    end for
end repeat

Algorithm 2. ASL (bagging + boosting)
input: training data T, number of base classifiers B, subspace size S, learning rate alpha, number of iterations T
output: ensemble classifier model
for t = 1 to T do:
    // Initialize base classifiers and weights
    for b = 1 to B do:
        // Randomly select subspace of features
        select S features at random
        // Train base classifier on subspace of features
        train base classifier using subset of T with selected features
        // Calculate weight for each base classifier
        calculate weight for base classifier based on classification error on validation set
    end for
    // Combine base classifiers using weighted majority voting
    for each test instance in T do:
        initialize ensemble output vector Y to zero
        for b = 1 to B do:
            // Classify instance using base classifier and add to ensemble output
            classify test instance using base classifier b and add weighted output to Y
        end for
        // Normalize output vector to get probability distribution
        normalize Y
        // Update weights for base classifiers based on error rate on this instance
        update weight for each base classifier based on error rate on this instance
    end for
end for
// Return ensemble classifier model
return ensemble model


In addition to developing an accurate and robust model, it is essential to evaluate the model's accuracy
in predicting HF. This is carried out using the confusion matrix and the receiver operating characteristic
(ROC) curve with its area under the curve (AUC). The ROC curve is constructed from values calculated from
the confusion matrix and plots the true positive rate (TPR) against the false positive rate (FPR), where:

a) FPR = False Positive/(False Positive + True Negative);

b) TPR = True Positive/(True Positive + False Negative).

A curve close to the diagonal baseline, which runs from point (0, 0) to point (1, 1), indicates BAD
classification, whereas a curve close to the point (0, 1) indicates GOOD classification.
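The metrics reported in Tables 3 and 4 and in Figure 4 all derive from the four confusion-matrix counts. The sketch below computes them under the standard definitions (FPR and TPR as given above, F-measure as the harmonic mean of precision and sensitivity); the function name `confusion_metrics` and the example counts are illustrative, not taken from the paper.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Evaluation metrics computed from confusion-matrix counts (0.0 where a denominator is 0)."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)        # true positive rate (recall)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "tpr": sensitivity,
        "fpr": ratio(fp, fp + tn),          # equals 1 - specificity
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

With FP = 0, as reported for k-NN in Figure 4, specificity is 1 and FPR is 0, provided the test data contain at least one negative example.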


3. RESULTS AND DISCUSSION

This section presents the experiments conducted and the results obtained for the 12 classifiers under the
standard model, PSO optimization, and PSO+ASL optimization. The aim is to compare these models and identify
which produces the best results for HF classification. To evaluate the models, a combination of split
validation and cross-validation was used with compositions of 70:30, 80:20, and 90:10 and k-fold=10. Model
performance was evaluated using accuracy, f-measure, sensitivity, specificity, and AUC. The accuracy of each
model is summarized in Table 3, and the AUC values for each model are presented in Table 4.

Across all 12 classifiers listed in Table 3, both the PSO and PSO+ASL optimization models improved
performance over the standard model, with accuracy gains ranging from under 1 to about 35 percentage
points. Notably, k-NN improved substantially at every dataset ratio, with an average gain of 27.99
percentage points across the two optimization models. The AUC values summarized in Table 4 also improved
for most classifiers, with gains ranging from 0.0204 to 0.077 compared to the standard model. k-NN with
PSO+ASL achieved a "very good classification" with an AUC value of 1 for all dataset ratio compositions.
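The Table 5 entries appear to be percentage-point differences between the optimized and standard accuracies in Table 3, and the 27.99 figure for k-NN is reproduced by averaging its six gains (PSO and PSO+ASL at three ratios). The short check below makes that arithmetic explicit using only values copied from Table 3; the helper names are illustrative.

```python
from typing import Dict, Tuple

Row = Tuple[float, float, float]  # accuracy (%): standard, PSO, PSO+ASL

# k-NN accuracies copied from Table 3
KNN_ACCURACY: Dict[str, Row] = {
    "70:30": (68.10, 100.0, 100.0),
    "80:20": (66.86, 83.92, 100.0),
    "90:10": (64.99, 83.92, 100.0),
}


def gains(row: Row) -> Tuple[float, float]:
    """Percentage-point gains of PSO and PSO+ASL over the standard model (Table 5 style)."""
    standard, pso, pso_asl = row
    return round(pso - standard, 2), round(pso_asl - standard, 2)


def mean_gain(table: Dict[str, Row]) -> float:
    """Average of all PSO and PSO+ASL gains across the dataset ratios."""
    values = [g for row in table.values() for g in gains(row)]
    return round(sum(values) / len(values), 2)


if __name__ == "__main__":
    for split, row in KNN_ACCURACY.items():
        print(split, gains(row))
    print("mean gain:", mean_gain(KNN_ACCURACY))
```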

Table 5 shows that combining PSO and ASL yields larger accuracy gains than PSO alone for many classifiers.
In the 70:30 dataset split, PSO+ASL produced larger gains for the decision tree, random tree, gradient
boosted tree, and Naive Bayes (Kernel), while SVM and k-NN showed no change. In the 80:20 dataset split,
PSO+ASL provided larger gains than PSO alone for most classifiers, with the largest increases seen in k-NN,
gradient boosted tree, and random tree; random forest, SVM (LibSVM), and LR (SVM) were exceptions, and Naive
Bayes was unchanged. In the 90:10 dataset split, PSO+ASL outperformed PSO alone for every classifier except
random forest. Overall, the use of PSO+ASL algorithms can improve classification performance for many
classifier types and dataset splits, but the appropriate algorithm should be chosen according to the
characteristics of the dataset used for predicting heart failure. The information in Table 5 is also
represented graphically in Figure 2.

The results obtained from grouping the classifiers, as depicted in Figure 3, reveal that the LM group
achieved the highest average accuracy, 100% with PSO+ASL. This corresponds to an average increase of
31.9, 25.1, and 26.97 percentage points (the mean of the PSO and PSO+ASL gains) for the 70:30, 80:20,
and 90:10 ratios, respectively, compared to the standard model. The average accuracy of each group is
reported in Table 6.
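Similarly, the Table 6 group values appear to be plain means of the Table 3 accuracies within each group, and the LM increases quoted above are means of the PSO and PSO+ASL gains. The sketch below checks this for the DTM group at 70:30 and for LM; the helper names are illustrative, and the data are copied from Table 3.

```python
from statistics import mean
from typing import Dict, Tuple

Row = Tuple[float, float, float]  # accuracy (%): standard, PSO, PSO+ASL

# Decision tree models (DTM) at the 70:30 ratio, copied from Table 3
DTM_70_30: Dict[str, Row] = {
    "Decision tree": (87.25, 88.34, 88.81),
    "Random forest": (85.07, 87.72, 87.25),
    "Gradient boosted tree": (84.45, 87.42, 100.0),
    "Random tree": (70.44, 82.10, 92.53),
}


def group_average(rows: Dict[str, Row]) -> Row:
    """Column-wise mean accuracy over the classifiers of one group."""
    standard, pso, pso_asl = (mean(column) for column in zip(*rows.values()))
    return standard, pso, pso_asl


def average_gain(row: Row) -> float:
    """Mean of the PSO and PSO+ASL gains over the standard model, in percentage points."""
    standard, pso, pso_asl = row
    return mean([pso - standard, pso_asl - standard])


if __name__ == "__main__":
    print("DTM 70:30:", group_average(DTM_70_30))
    lm_rows = {"70:30": (68.10, 100.0, 100.0),
               "80:20": (66.86, 83.92, 100.0),
               "90:10": (64.99, 83.92, 100.0)}
    for split, row in lm_rows.items():
        print("LM", split, average_gain(row))
```

The DTM means come out as 81.8025, 86.395, and 92.1475, which agree with the Table 6 entries 81.80, 86.40, and 92.15 to within rounding.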


Table 3. Evaluation for each classifier model (accuracy (%)) 


Group  Classifier              70:30                      80:20                      90:10
                               Standard  PSO    PSO+ASL   Standard  PSO    PSO+ASL   Standard  PSO    PSO+ASL
DTM    Decision tree           87.25     88.34  88.81     85.55     87.33  87.34     85.59     86.43  87.40
       Random forest           85.07     87.72  87.25     86.1      87.87  87.73     84.86     87.05  86.43
       Gradient boosted tree   84.45     87.42  100       86.38     88.69  100       86.08     87.77  100
       Random tree             70.44     82.10  92.53     74.55     82.56  91.96     79.17     81.49  91.65
SVM    SVM                     86.94     87.87  87.87     85.83     86.77  87.60     85.47     87.05  87.29
       SVM (LibSVM)            72.63     81.65  83.36     72.21     85.83  81.88     72.41     79.66  85.59
       SVM (linear)            87.1      88.49  87.71     85.68     87.32  87.60     85.59     86.91  87.29
NBCM   Naive Bayes             86.47     88.01  88.80     86.93     88.01  88.01     86.45     87.77  88.62
       Naive Bayes (Kernel)    84.91     89.24  90.98     85.01     87.05  90.05     85.1      87.54  89.35
LRM    LR                      86.47     88.80  88.34     86.51     87.21  87.74     86.2      87.30  87.53
       LR (SVM)                85.85     88.02  88.49     84.74     87.74  87.60     84.98     87.05  87.29
LM     k-NN                    68.1      100    100       66.86     83.92  100       64.99     83.92  100

Table 4. Evaluation for each classifier model (AUC)

Group  Classifier              70:30                      80:20                      90:10
                               Standard  PSO    PSO+ASL   Standard  PSO    PSO+ASL   Standard  PSO    PSO+ASL
DTM    Decision tree           0.863     0.8830 0.9180    0.847     0.8690 0.9140    0.843     0.8490 0.9060
       Random forest           0.896     0.9080 0.9240    0.908     0.9130 0.9230    0.908     0.9150 0.9220
       Gradient boosted tree   0.922     0.9240 1         0.93      0.9260 1         0.927     0.9240 1
       Random tree             0.702     0.8180 0.9480    0.735     0.8360 0.9450    0.804     0.8250 0.9350
SVM    SVM                     0.928     0.9290 0.9330    0.925     0.9250 0.9240    0.923     0.9180 0.9180
       SVM (LibSVM)            0.784     0.8960 0.8460    0.774     0.9120 0.8370    0.778     0.8540 0.8500
       SVM (linear)            0.923     0.9340 0.9320    0.921     0.9240 0.9240    0.922     0.9170 0.9180
NBCM   Naive Bayes             0.913     0.9260 0.9450    0.919     0.9250 0.9280    0.917     0.9230 0.9260
       Naive Bayes (Kernel)    0.898     0.9450 0.9700    0.905     0.9220 0.9620    0.907     0.9120 0.9520
LRM    LR                      0.931     0.9360 0.9260    0.927     0.9290 0.9120    0.926     0.9260 0.9030
       LR (SVM)                0.932     0.9330 0.8900    0.925     0.9250 0.8830    0.924     0.9220 0.8710
LM     k-NN                    0.5       0.5000 1         0.5       0.5000 1         0.5       0.5000 1

Table 5. The improved average accuracy of each classifier (percentage points over the standard model)

Group  Classifier              70:30            80:20            90:10
                               PSO    PSO+ASL   PSO    PSO+ASL   PSO    PSO+ASL
DTM    Decision tree           1.09   1.56      1.78   1.79      0.84   1.81
       Random forest           2.65   2.18      1.77   1.63      2.19   1.57
       Gradient boosted tree   2.97   15.55     2.31   13.62     1.69   13.92
       Random tree             11.66  22.09     8.01   17.41     2.32   12.48
SVM    SVM                     0.93   0.93      0.94   1.77      1.58   1.82
       SVM (LibSVM)            9.02   10.73     13.62  9.67      7.25   13.18
       SVM (linear)            1.39   0.61      1.64   1.92      1.32   1.70
NBCM   Naive Bayes             1.54   2.33      1.08   1.08      1.32   2.17
       Naive Bayes (Kernel)    4.33   6.07      2.04   5.04      2.44   4.25
LRM    LR                      2.33   1.87      0.70   1.23      1.10   1.33
       LR (SVM)                2.17   2.64      3.00   2.86      2.07   2.31
LM     k-NN                    31.90  31.90     17.06  33.14     18.93  35.01
Figure 2. Improvement in the average accuracy of each classifier (70:30, 80:20, 90:10)
Figure 3. Average accuracy by classifier group (70:30, 80:20, 90:10)


Table 6. The results of the average value of accuracy by group 


Group   70:30                      80:20                      90:10
        Standard  PSO    PSO+ASL   Standard  PSO    PSO+ASL   Standard  PSO    PSO+ASL
DTM     81.80     86.40  92.15     83.15     86.61  91.76     83.93     85.69  91.37
SVM     82.22     86.00  86.31     81.24     86.64  85.69     81.16     84.54  86.72
NBCM    85.69     88.63  89.89     85.97     87.53  89.03     85.78     87.66  88.99
LRM     86.16     88.41  88.42     85.63     87.48  87.67     85.59     87.18  87.41
LM      68.10     100    100       66.86     83.92  100       64.99     83.92  100
The combination of PSO and ASL in the HF classification study showed that the k-NN method achieved the
best performance across all dataset ratio compositions (70:30, 80:20, and 90:10 with k-fold=10), reaching
100% accuracy and an AUC of 1 (a result matched by the gradient boosted tree under PSO+ASL). The analysis
results are shown as performance vectors in Figure 4. For the 70:30 ratio, the number of true positives (TP)
is 287, indicating correct prediction of HF cases, and the number of false positives (FP) is 0, meaning that
no negative instances were incorrectly classified as positive. Similarly, for the 80:20 and 90:10 ratios,
the TP values are 328 and 369, respectively, and the FP value remains at 0 in both cases, indicating
accurate prediction of negative data.
Figure 4. Performance results of the k-NN algorithm (70:30, 80:20, and 90:10)
4. CONCLUSION 


The study on the integration of ASL and PSO techniques for classification data mining to predict HF 
has yielded promising results. The primary goal of the study was to enhance the accuracy of traditional ML 
algorithms in classifying HF patients based on various clinical characteristics. To achieve this, twelve 
classifiers were employed and categorized into five groups: DTM, SVM, NBCM, LRM, and LM. The 
parameters of these algorithms were optimized using ASL and PSO techniques, while a combination of split 
validation and cross-validation with composition ratios of 70:30, 80:20, and 90:10, along with a k-fold value 
of 10, was used. The results indicated that ASL and PSO techniques outperformed the conventional ML
algorithms in terms of accuracy and AUC. However, the study had certain limitations, such as a small sample
size and the absence of external validation, which warrant further investigation to assess the effectiveness
of ASL and PSO techniques in a broader patient population. In conclusion, this research suggests that
PSO-based ASL techniques for classification data mining have promising implications for clinical practice
and patient outcomes in predicting HF.


REFERENCES 


[1] S. Q. Duong et al., "Identification of patients at risk of new onset heart failure: Utilizing a large statewide health information
exchange to train and validate a risk prediction model," PLoS One, vol. 16, no. 12, pp. 1-13, 2021, doi:
10.1371/journal.pone.0260885.

[2] S. Saepudin, P. Ball, and H. Morrissey, “Development of prediction model for identifying heart failure patients with high risk of 
developing hyponatremia," J. Kedokt. dan Kesehat. Indones., vol. 10, no. 2, pp. 121-131, 2019, doi:
10.20885/jkki.vol10.iss2.art4. 

[3] E. E. Tripoliti, T. G. Papadopoulos, G. S. Karanasiou, K. K. Naka, and D. I. Fotiadis, “Heart Failure: Diagnosis, Severity 
Estimation and Prediction of Adverse Events Through Machine Learning Techniques,” Comput. Struct. Biotechnol. J., vol. 15, pp. 
26-47, 2017, doi: 10.1016/j.csbj.2016.11.001. 

[4] A. W. Sugiyarto, A. M. Abadi, and Sumarna, "Classification of heart disease based on PCG signal using CNN," Telkomnika
(Telecommunication Comput. Electron. Control.), vol. 19, no. 5, pp. 1697-1706, 2021, doi:
10.12928/TELKOMNIKA.v19i5.20486.








T. N. Nguyen, T. H. Nguyen, and V. T. Ngo, "Artifact elimination in ECG signal using wavelet transform," Telkomnika
(Telecommunication Comput. Electron. Control.), vol. 18, no. 2, pp. 936-944, 2020, doi: 10.12928/TELKOMNIKA.v18i2.14403.

Y. Wang et al., "Early Detection of Heart Failure with Varying Prediction Windows by Structured and Unstructured Data in
Electronic Health Records," HHS Public Access, vol. 176, no. 1, pp. 139-148, 2018, doi: 10.1109/EMBC.2015.7318907.

T. Chen, S. Zhao, S. Shao, and S. Zheng, "Non-invasive diagnosis methods of coronary disease based on wavelet denoising and
sound analyzing," Saudi J. Biol. Sci., vol. 24, no. 3, pp. 526-536, 2017, doi: 10.1016/j.sjbs.2017.01.023.

A. F. AlOthman, A. R. W. Sait, and T. A. Alhussain, "Detecting Coronary Artery Disease from Computed Tomography Images
Using a Deep Learning Technique," Diagnostics, vol. 12, no. 9, 2022, doi: 10.3390/diagnostics12092073.

A. P. Windarto and T. Herawan, Decision Support System on Determination of Contraception Tools as an Effort to Suppress the
Number of Growth Ratios in Indonesia, vol. 730. Springer Nature Singapore Pte Ltd, 2021, doi: 10.1007/978-981-33-4597-3_69.

A. P. Windarto and T. Herawan, K-Means Algorithm with Rapidminer in Clustering School Participation Rate in Indonesia.
Springer Nature Singapore Pte Ltd, 2021, doi: 10.1007/978-981-33-4597-3_70.

A. Al Bataineh and S. Manacek, "MLP-PSO Hybrid Algorithm for Heart Disease Prediction," J. Pers. Med., vol. 12, no. 8, 2022,
doi: 10.3390/jpm12081208.

I. D. Mienye and Y. Sun, "Improved heart disease prediction using particle swarm optimization based stacked sparse
autoencoder," Electron., vol. 10, no. 19, 2021, doi: 10.3390/electronics10192347.

S. I. Novichasari and I. S. Wibisono, "Particle Swarm Optimization For Improved Accuracy of Disease Diagnosis," J. Appl. Intell.
Syst., vol. 5, no. 2, pp. 57-68, 2020.

C. Krittanawong et al., "Integration of novel monitoring devices with machine learning technology for scalable cardiovascular
management," Nat. Rev. Cardiol., vol. 18, no. 2, pp. 75-91, 2021, doi: 10.1038/s41569-020-00445-9.

A. Javeed, S. U. Khan, L. Ali, S. Ali, Y. Imrana, and A. Rahman, "Machine Learning-Based Automated Diagnostic Systems
Developed for Heart Failure Prediction Using Different Types of Data Modalities: A Systematic Review and Future Directions,"
Comput. Math. Methods Med., vol. 2022, 2022, doi: 10.1155/2022/9288452.

A. Guo, M. Pasque, F. Loh, D. L. Mann, and P. R. O. Payne, "Heart Failure Diagnosis, Readmission, and Mortality Prediction
Using Machine Learning and Artificial Intelligence Models," Curr. Epidemiol. Reports, vol. 7, no. 4, pp. 212-219, 2020, doi:
10.1007/s40471-020-00259-w.

D. J. Choi, J. J. Park, T. Ali, and S. Lee, "Artificial intelligence for the diagnosis of heart failure," npj Digit. Med., vol. 3, no. 1,
2020, doi: 10.1038/s41746-020-0261-3.

D. K. Plati et al., "A machine learning approach for chronic heart failure diagnosis," Diagnostics, vol. 11, no. 10, pp. 1-15, 2021,
doi: 10.3390/diagnostics11101863.

S. Kordnoori, H. Mostafaei, M. Rostamy-Malkhalifeh, and M. Ostadrahimi, "Diagnosis of Heart Disease Using Feature Selection
Methods Based On Recurrent Fuzzy Neural Networks," IPTEK J. Technol. Sci., vol. 32, no. 2, p. 64, 2021, doi:
10.12962/)20882033.v32i2.7075. 

Q. Bai, C. Su, W. Tang, and Y. Li, “Machine learning to predict end stage kidney disease in chronic kidney disease,” Sci. Rep., 
vol. 12, no. 1, pp. 1-8, 2022, doi: 10.1038/s41598-022-12316-z. 

M. E. Grams ef al., “Predicting timing of clinical outcomes in patients with chronic kidney disease and severely decreased 
glomerular filtration rate,” Kidney Int., vol. 93, no. 6, pp. 1442-1451, 2018, doi: 10.1016/j.kint.2018.01.009. 

E. Dovgan et al., “Using machine learning models to predict the initiation of renal replacement therapy among chronic kidney 
disease patients,” PLoS One, vol. 15, no. 6, pp. 1-18, 2020, doi: 10.137 1/journal.pone.0233976. 

C. L. Ramspek et al., “Predicting Kidney Failure, Cardiovascular Disease and Death in Advanced CKD Patients,” Kidney Int. 
Reports, vol. 7, no. 10, pp. 2230-2241, 2022, doi: 10.1016/j.ekir.2022.07.165. 

M. Ozcan and S. Peker, “A classification and regression tree algorithm for heart disease modeling and prediction,” Healthc. Anal., 
vol. 3, no. December 2022, p. 100130, 2023, doi: 10.1016/j.health.2022. 100130. 

M. Q. Syafi, “Increasing Accuracy of Heart Disease Classification on C4.5 Algorithm Based on Information Gain Ratio and 
Particle Swarm Optimization Using Adaboost Ensemble,” J. Adv. Inf. Syst. Technol., vol. 4, no. 1, pp. 100-112, 2022. 

M. K. Iliyas and I. S. Shaikh, “Prediction of Heart Disease Using Decision Tree,” Int. J. Adv. Res. Comput. Sci. Softw. Eng., vol. 
6, no. 3, pp. 530-532, 2016. 

E. Owusu, P. Boakye-Sekyerehene, J. K. Appati, and J. Y. Ludu, “Computer-Aided Diagnostics of Heart Disease Risk Prediction 
Using Boosting Support Vector Machine,” Comput. Intell. Neurosci., vol. 2021, 2021, doi: 10.1155/2021/3152618. 

V. S. K. Reddy, P. Meghana, N. V. S. Reddy, and B. A. Rao, “Prediction on Cardiovascular disease using Decision tree and Naive 
Bayes classifiers,” J. Phys. Conf. Ser., vol. 2161, no. 1, 2022, doi: 10.1088/1742-6596/2161/1/012015. 

S. Jabbedari Khiabani, A. Batani, and E. Khanmohammadi, “A hybrid decision support system for heart failure diagnosis using 
neural networks and statistical process control,” Healthc. Anal., vol. 2, p. 100110, 2022, doi: 10.1016/j-health.2022.100110. 

M. Yuvalı, B. Yaman, and O. Tosun, “Classification Comparison of Machine Learning Algorithms Using Two Independent CAD 
Datasets,” Mathematics, vol. 10, no. 3, 2022, doi: 10.3390/math10030311. 

R. zaib and O. Ourabah, “Large Scale Data Using K-Means,” Mesopotamian J. Big Data, pp. 38-47, 2023, doi: 
10.58496/mjbd/2023/006. 

M. Alajanbi, D. Malerba, and H. Liu, “Distributed Reduced Convolution Neural Networks,” Mesopotamian J. Big Data, pp. 26— 
29, 2021, doi: 10.58496/mjbd/202 1/005. 

A. Tharwat, “Classification assessment methods,” Appl. Comput. Informatics, vol. 17, no. 1, pp. 168-192, 2018, doi: 
10.1016/j.aci.2018.08.003. 

P. H. Kasani, J. E. Lee, C. Park, C. H. Yun, J. W. Jang, and S. A. Lee, “Evaluation of nutritional status and clinical depression 
classification using an explainable machine learning method,” Front. Nutr., vol. 10, 2023, doi: 10.3389/fnut.2023.1165854. 

I. Markoulidakis, I. Rallis, I. Georgoulas, G. Kopsiaftis, A. Doulamis, and N. Doulamis, “Multiclass Confusion Matrix Reduction 
Method and Its Application on Net Promoter Score Classification Problem,” Technologies, vol. 9, no. 4, 2021, doi: 
10.3390/technologies904008 1. 

D. Chicco and G. Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary 
classification evaluation,’ BMC Genomics, vol. 21, no. 1, pp. 1-13, 2020, doi: 10.1186/s12864-019-6413-7. 


TELKOMNIKA Telecommun Comput El Control, Vol. 22, No. 1, February 2024: 76-85 


BIOGRAPHIES OF AUTHORS 


Mesran was born in Medan on August 24, 1978. He completed his master’s degree in Computer Science in 2008 at Universitas Putra Indonesia. He has taught at STMIK Budi Darma since 2005 as a permanent lecturer in the Informatics Engineering program. He can be contacted at email: mesran.skom.mkom@gmail.com.


Remuz Mb Kmurawak is a passionate and enthusiastic lecturer who approaches his work with care and dedication, aiming to inspire and motivate students to achieve academic success. His objective extends beyond imparting knowledge: he strives to develop high-caliber students equipped with practical learning skills. He has excellent communication skills and recognizes their vital role in promoting teamwork and attaining clear objectives. With a master’s degree in Information Technology, he has a solid foundation in data analysis and extensive experience as a lecturer in the Information System Department at Cenderawasih University. He also actively participates in the Papuan Information and Communication Technology Council, reflecting his commitment to the field. From 2008 to 2011, he worked as a Data Analyst at PT Probindo Artika Jaya, further building his expertise and practical application of knowledge. He can be contacted at email: remuzbertho3@gmail.com.


Agus Perdana Windarto was born in Pematangsiantar on August 30, 1986. He completed his master’s degree in Computer Science in 2014 at Universitas Putra Indonesia ‘YPTK’ Padang and is currently pursuing a doctorate (Ph.D.) at the same university. He has been an active lecturer in the Information Systems program at STIKOM Tunas Bangsa since 2012. His research focuses on artificial intelligence (decision support systems, expert systems, data mining, neural networks, fuzzy logic, deep learning, and genetic algorithms). He has also served as a reviewer for various nationally accredited journals (SINTA 2 - SINTA 6) and manages a community called “Pemburu Jurnal” at STIKOM Tunas Bangsa. He has won multiple grants: DIKTI research grants (twice, in 2018-2019), a DIKTI Community Service Grant (once, in 2019), a PKM-P Grant (as a student advisor in 2018), and a PKM-AI Grant (as a student advisor in 2019). He is also a member of the Relawan Jurnal Indonesia (RJI) community in North Sumatra, the Data Science Indonesia Researchers Association (PDSI), and the Forum of Higher Education Communities (FKPT), and is a co-founder of the Yayasan Adwitiya Basurata Inovasi (Yayasan Abivasi) foundation with fellow professors. He can be contacted at email: agus.perdana@amiktunasbangsa.ac.id or agus.perdana@abivasi.id.




